Research Interests

Hardware Prefetching and Computer Architecture

I am relatively far away (although not so much so) from something of a completed manuscript for my thesis, so here is a collection of thoughts that provide a general direction for my intents.

A way to quantify prefetching is to compare existing hardware prefetchers to some magical oracle prefetcher. You can do this by analyzing the specific prefetch streams generated by said prefetchers. When you compare these streams, you can generally separate them into four sets:

It is easy to separate Sets 1, 2, and 3+4, but it's much harder to separate Set 3 from Set 4, which is the set we really care about, since Set 3 contains hard-to-make prefetches that can guide future prefetcher design.

The first-order concern is to actually create motivation to do this research. That is, is Set 3+4 actually large enough to justify putting effort into discovering its contents? There are multiple ways to do this, none of which seem (at least to me) particularly more correct than the others. The main point here is to quantify the difference between oracle streams and other prefetcher streams.

One way to do this is via compression, as a proxy for finding the Kolmogorov complexity of the streams. If you can compress a file well, it means there is less entropy in the file. I should add that the point of needing to calculate the entropy/k-complexity is that if a stream has a high amount of entropy, then it is naturally unpredictable. If the oracle stream had high entropy, it would mean the oracle is making seemingly random or unpredictable decisions on what to prefetch using its prescient information. However, according to my data so far, this is not the case—the oracle stream generally has one of the lower entropies across many benchmarks. This means the oracle is actually making relatively more patterned decisions than existing prefetchers. I am currently at this stage, which is to say mainly data collection and analysis.

Next comes the much harder part of actually separating Set 3 from Set 4. Again, there are multiple ways to do this:

Ultimately, I suspect what's going to happen is some combination of multiple of these methods (conservatively) acting as filters that generates some subset of Set 3.